TCGA BRCA


TCGA-BRCA data preparation

Download TCGA-BRCA data

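library(TCGAbiolinks)
library(dplyr)  # provides the %>% pipe used throughout this document
# `dir` (the folder for the saved .rds files, ending with "/") must be defined before running this code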

See the GDCquery documentation and the GDCquery data category list (PDF).

# query <- GDCquery(project = "TCGA-BRCA",
#                   data.category = "Transcriptome Profiling",
#                   data.type = "Gene Expression Quantification")
# 
query <- GDCquery(project = "TCGA-BRCA",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification", 
                  sample.type = c("Primary Tumor","Solid Tissue Normal"))
df = query$results[[1]]
df$sample_type %>% table() %>% data.frame()
# Download and prepare data
GDCdownload(query)
data <- GDCprepare(query)
data %>% saveRDS(paste0(dir,"TCGA_BRCA_transcriptome.rds"))
data = readRDS(paste0(dir,"TCGA_BRCA_transcriptome.rds"))
library(DESeq2)
# List the names of all available assays
assayNames(data)

RNA-sequencing data types

  1. unstranded: Raw read counts assigned to each gene regardless of the strand the reads map to. This is the appropriate count column for libraries prepared with a non-stranded protocol, where the transcribed strand cannot be recovered.

  2. stranded_first: Raw read counts for reads whose first read (read 1) maps to the same strand as the gene (STAR's count for the "1st read strand aligned with RNA", equivalent to htseq-count option -s yes). For a strand-specific library this identifies the strand the RNA was transcribed from, which helps resolve overlapping genes transcribed in opposite directions.

  3. stranded_second: Raw read counts for reads whose second read (read 2) maps to the same strand as the gene (equivalent to htseq-count option -s reverse). This is the informative column for reverse-stranded protocols such as dUTP-based kits.

  4. tpm_unstrand: Transcripts Per Million (TPM) computed from the unstranded counts. TPM divides counts by gene length first and then scales each sample so that its values sum to one million, which makes the relative expression of a gene comparable across samples.

  5. fpkm_unstrand: Fragments Per Kilobase of transcript per Million mapped reads (FPKM) computed from the unstranded counts. FPKM normalizes by gene (transcript) length and by sequencing depth. It supports comparisons between genes within a sample, but because FPKM totals differ from sample to sample, comparisons between samples are less straightforward than with TPM.

  6. fpkm_uq_unstrand: Upper-quartile normalized FPKM (FPKM-UQ) computed from the unstranded counts. Instead of the total number of mapped reads, the scaling factor is the read count of the gene at the 75th percentile (upper quartile) of protein-coding genes in the sample. This reduces the influence of a small number of very highly expressed genes and improves comparability between samples.
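The three normalized columns can be related to the raw counts with a few lines of base R. This is a minimal sketch using hypothetical toy counts and gene lengths and the textbook formulas; the GDC pipeline uses its own gene-length definitions and computes the FPKM-UQ upper quartile over protein-coding genes only, so real values will not match this toy calculation exactly.

```r
# Hypothetical toy data: raw counts for 4 genes x 2 samples, and gene lengths (bp)
toy_counts <- matrix(c(100, 200, 300, 400,
                       50, 100, 150, 1000),
                     nrow = 4,
                     dimnames = list(paste0("gene", 1:4), c("S1", "S2")))
toy_len <- c(gene1 = 1000, gene2 = 2000, gene3 = 500, gene4 = 4000)

# TPM: divide by length in kb first, then scale each sample so it sums to 1e6
to_tpm <- function(counts, len) {
  rpk <- counts / (len / 1e3)
  t(t(rpk) / colSums(rpk)) * 1e6
}

# FPKM: divide by library size in millions and by gene length in kb
to_fpkm <- function(counts, len) {
  t(t(counts) / colSums(counts)) * 1e6 / (len / 1e3)
}

# FPKM-UQ: replace the library size with the per-sample 75th-percentile count
# (computed over all toy genes here; GDC uses protein-coding genes only)
to_fpkm_uq <- function(counts, len) {
  uq <- apply(counts, 2, quantile, probs = 0.75, type = 1)
  t(t(counts) / uq) * 1e9 / len
}

tpm     <- to_tpm(toy_counts, toy_len)
fpkm    <- to_fpkm(toy_counts, toy_len)
fpkm_uq <- to_fpkm_uq(toy_counts, toy_len)

colSums(tpm)                                        # every sample sums to 1e6
all.equal(tpm, t(t(fpkm) / colSums(fpkm)) * 1e6)    # TRUE: TPM is FPKM rescaled per sample
```

The last line reflects a useful identity: within one sample, TPM and FPKM rank genes identically and differ only by a per-sample constant.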

# unstranded: strand-nonspecific expression (raw counts)
# stranded_first: strand-specific expression (first strand)
# stranded_second: strand-specific expression (second strand)
# tpm_unstrand: strand-nonspecific expression normalized as TPM (Transcripts Per Million)
# fpkm_unstrand: strand-nonspecific expression normalized as FPKM (Fragments Per Kilobase of transcript per Million mapped reads)
# fpkm_uq_unstrand: strand-nonspecific expression normalized as FPKM-UQ (Upper Quartile normalized FPKM)

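Select the raw read counts

# 'unstranded' holds raw integer counts, which DESeq2 requires; TCGA RNA-seq libraries are generally non-stranded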
counts <- assay(data, "unstranded")
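Because the right count column depends on the library protocol, a quick heuristic check is to compare the genome-wide totals of the two stranded assays: for a non-stranded library they are roughly equal, while for a stranded library one clearly dominates. Below is a minimal sketch with a hypothetical helper, infer_strandedness(), on toy matrices. The 0.1/0.9 cut-offs are arbitrary assumptions; on the real object the inputs would be assay(data, "stranded_first") and assay(data, "stranded_second").

```r
# Hypothetical helper: classify each sample's library strandedness from the
# stranded_first / stranded_second count matrices (genes x samples).
# The low/high thresholds are illustrative assumptions, not established cut-offs.
infer_strandedness <- function(first_counts, second_counts, low = 0.1, high = 0.9) {
  frac_first <- colSums(first_counts) / (colSums(first_counts) + colSums(second_counts))
  ifelse(frac_first >= high, "stranded (first)",
         ifelse(frac_first <= low, "stranded (second)", "unstranded"))
}

# Toy example: S1 behaves like a non-stranded library, S2 like a reverse-stranded one
first_counts  <- matrix(c(500, 480, 20, 30), nrow = 2,
                        dimnames = list(c("g1", "g2"), c("S1", "S2")))
second_counts <- matrix(c(510, 470, 900, 950), nrow = 2,
                        dimnames = list(c("g1", "g2"), c("S1", "S2")))
strand_call <- infer_strandedness(first_counts, second_counts)
strand_call
```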

Convert ENSG to Symbols

# ENSG to Symbols
library(org.Hs.eg.db)

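# Remove the version suffix from the Ensembl gene IDs (everything from the first ".")
ensembl_ids <- gsub("\\..*", "", rownames(counts))

# Handle duplicated IDs: look up each unique ID only once
unique_ids <- unique(ensembl_ids)

# Map gene symbols for the de-duplicated IDs
gene_symbols <- mapIds(org.Hs.eg.db,
                       keys = unique_ids,
                       column = "SYMBOL",
                       keytype = "ENSEMBL",
                       multiVals = "first")

# Assign the mapped symbols back to the original rows:
# the de-duplicated lookup is indexed by each original ID (unmapped IDs become NA)
symbol_names <- gene_symbols[ensembl_ids]
rownames(counts) <- unname(symbol_names)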
# Metadata 
filtered_colData <- colData(data)[colnames(counts),]
TCGA_BRCA_countsMeta = list(counts = counts,
                            meta = filtered_colData)
TCGA_BRCA_countsMeta %>% saveRDS(paste0(dir,"TCGA_BRCA_countsMeta.rds"))
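Stripping versions with gsub("\\..*", "", ...) drops everything after the first dot, so IDs that differ only in that suffix collapse to the same key (in GENCODE-based annotations this can happen, for example, for pseudoautosomal copies labelled with a _PAR_Y suffix). The symbol mapping can also return NA, or the same symbol for several IDs. A minimal base-R sketch with made-up IDs and a hypothetical lookup vector standing in for the mapIds() result, showing how such rows can be detected and dropped before building the DESeq2 object:

```r
# Made-up versioned Ensembl IDs (not real annotations); the third one carries a _PAR_Y suffix
ids <- c("ENSG0000000001.5", "ENSG0000000002.3", "ENSG0000000002.3_PAR_Y", "ENSG0000000003.1")

# Hypothetical stand-in for mapIds() output: named by unversioned ID, NA when unmapped
lookup <- c(ENSG0000000001 = "GENEA", ENSG0000000002 = "GENEB", ENSG0000000003 = NA)

stripped <- gsub("\\..*", "", ids)     # drops ".version" and anything after it
symbols  <- unname(lookup[stripped])   # one symbol (or NA) per original row

# Keep rows whose symbol is mapped and appears for the first time
keep_row <- !is.na(symbols) & !duplicated(symbols)

counts_toy <- matrix(1:8, nrow = 4, dimnames = list(ids, c("S1", "S2")))
counts_sym <- counts_toy[keep_row, , drop = FALSE]
rownames(counts_sym) <- symbols[keep_row]
counts_sym
```

Keeping the first occurrence is the simplest choice; summing duplicated rows or keeping the row with the highest total count are common alternatives.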

TCGA-BRCA project metadata

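  1. barcode: TCGA aliquot barcode of each sample; used as the column name of the expression matrix.
  2. patient: Patient (participant) barcode, i.e. the first 12 characters of the TCGA barcode.
  3. sample: Sample barcode, i.e. the first 16 characters of the TCGA barcode (including the vial letter).
  4. shortLetterCode: Short letter code for the sample type (e.g., TP = Primary Tumor, NT = Solid Tissue Normal).
  5. definition: Full description of the sample type (e.g., Primary solid Tumor).
  6. sample_submitter_id: ID of the submitted sample.
  7. sample_type_id: Two-digit sample type code (e.g., 01 = Primary Tumor, 11 = Solid Tissue Normal).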
  8. tumor_descriptor: Provides a description of the tumor.
  9. sample_id: Sample ID.
  10. sample_type: Type of sample (e.g., Primary Tumor).
  11. composition: Describes the composition of the sample.
  12. days_to_collection: Number of days from the index date to sample collection.
  13. state: State of the GDC record (e.g., released).
  14. initial_weight: Initial weight of the sample (in milligrams).
  15. preservation_method: Sample preservation method.
  16. pathology_report_uuid: Unique identifier for the pathology report.
  17. submitter_id: Submitter ID.
  18. oct_embedded: Whether the specimen was embedded in OCT (Optimal Cutting Temperature) compound.
  19. specimen_type: Type of specimen.
  20. is_ffpe: Whether treated with Formalin-Fixed Paraffin-Embedded (FFPE).
  21. tissue_type: Type of tissue.
  22. synchronous_malignancy: Presence or absence of synchronous malignancy.
  23. ajcc_pathologic_stage: AJCC pathological stage.
  24. days_to_diagnosis: Number of days from the index date to diagnosis.
  25. treatments: Treatment details.
  26. last_known_disease_status: Last known disease status.
  27. tissue_or_organ_of_origin: Tissue or organ of origin.
  28. days_to_last_follow_up: Days to last follow-up.
  29. age_at_diagnosis: Age at diagnosis (in days).
  30. primary_diagnosis: Primary diagnosis.
  31. prior_malignancy: Presence or absence of prior malignancy.
  32. year_of_diagnosis: Year of diagnosis.
  33. prior_treatment: Previous treatment details.
  34. ajcc_staging_system_edition: Edition of the AJCC staging system.
  35. ajcc_pathologic_t: AJCC pathological T category (primary tumor).
  36. morphology: ICD-O-3 morphology code (e.g., 8500/3).
  37. ajcc_pathologic_n: AJCC pathological N category (regional lymph nodes).
  38. ajcc_pathologic_m: AJCC pathological M category (distant metastasis).
  39. classification_of_tumor: Tumor classification.
  40. diagnosis_id: Diagnosis ID.
  41. icd_10_code: International Classification of Diseases code (ICD-10).
  42. site_of_resection_or_biopsy: Site of resection or biopsy.
  43. tumor_grade: Tumor grade.
  44. progression_or_recurrence: Whether there is progression or recurrence.
  45. alcohol_history: Alcohol history.
  46. exposure_id: Exposure ID.
  47. race: Race.
  48. gender: Gender.
  49. ethnicity: Ethnicity.
  50. vital_status: Vital status.
  51. age_at_index: Age (in years) at the index date.
  52. days_to_birth: Number of days between birth and the index date (stored as a negative value).
  53. year_of_birth: Year of birth.
  54. demographic_id: Demographic ID.
  55. days_to_death: Days to death.
  56. year_of_death: Year of death.
  57. bcr_patient_barcode: BCR patient barcode.
  58. primary_site: Primary site.
  59. project_id: Project ID.
  60. disease_type: Type of disease.
  61. name: Project name (e.g., Breast Invasive Carcinoma).
  62. releasable: Whether releasable.
  63. released: Whether released.
  64. days_to_sample_procurement: Days to sample procurement.
  65. paper_patient: Patient identifier in the TCGA publication data.
  66. paper_Tumor.Type: Tumor type described in the paper.
  67. paper_Included_in_previous_marker_papers: Whether included in previous marker papers.
  68. paper_vital_status: Vital status described in the paper.
  69. paper_days_to_birth: Days to birth described in the paper.
  70. paper_days_to_death: Days to death described in the paper.
  71. paper_days_to_last_followup: Days to last follow-up described in the paper.
  72. paper_age_at_initial_pathologic_diagnosis: Age at initial pathological diagnosis described in the paper.
  73. paper_pathologic_stage: Pathological stage described in the paper.
  74. paper_Tumor_Grade: Tumor grade described in the paper.
  75. paper_BRCA_Pathology: Histological type (e.g., IDC, ILC) described in the paper.
  76. paper_BRCA_Subtype_PAM50: PAM50 intrinsic subtype (LumA, LumB, Her2, Basal, Normal) described in the paper.
  77. paper_MSI_status: MSI status described in the paper.
  78. paper_HPV_Status: HPV status described in the paper.
  79. paper_tobacco_smoking_history: Tobacco smoking history described in the paper.
  80. paper_CNV Clusters: CNV clusters described in the paper.
  81. paper_Mutation Clusters: Mutation clusters described in the paper.
  82. paper_DNA.Methylation Clusters: DNA Methylation clusters described in the paper.
  83. paper_mRNA Clusters: mRNA clusters described in the paper.
  84. paper_miRNA Clusters: miRNA clusters described in the paper.
  85. paper_lncRNA Clusters: lncRNA clusters described in the paper.
  86. paper_Protein Clusters: Protein clusters described in the paper.
  87. paper_PARADIGM Clusters: PARADIGM clusters described in the paper.
  88. paper_Pan-Gyn Clusters: Pan-Gyn clusters described in the paper.
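The barcode, patient, sample, and sample_type_id columns are nested pieces of the same TCGA aliquot barcode: the first 12 characters identify the patient, the first 16 the sample (including the vial letter), and characters 14-15 the sample-type code (01 = Primary Tumor, 11 = Solid Tissue Normal). A minimal base-R sketch of that structure; parse_tcga_barcode() is a hypothetical helper and the barcodes are made up for illustration.

```r
# Made-up aliquot barcodes: project-TSS-participant-sample/vial-portion/analyte-plate-center
barcodes <- c("TCGA-XX-0001-01A-11R-A000-07",   # tumor sample of participant 0001
              "TCGA-XX-0001-11A-11R-A000-07")   # matched normal of the same participant

# Hypothetical helper: split barcodes into the levels used in the metadata
parse_tcga_barcode <- function(barcode) {
  data.frame(
    barcode        = barcode,
    patient        = substr(barcode, 1, 12),
    sample         = substr(barcode, 1, 16),
    sample_type_id = substr(barcode, 14, 15),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_tcga_barcode(barcodes)
# Translate the two sample-type codes used in this analysis into labels
parsed$sample_type <- c("01" = "Primary Tumor", "11" = "Solid Tissue Normal")[parsed$sample_type_id]
parsed
```

Because one patient can contribute both a tumor and a normal sample, the patient column (not barcode) is the key to use when matching samples to clinical data or pairing tumor/normal.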



TCGA-BRCA Analysis

TCGA_BRCA_countsMeta = readRDS(paste0(dir,"TCGA_BRCA_countsMeta.rds")) 

treatment = TCGA_BRCA_countsMeta$meta$treatments

Create a DESeq2 object

count.raw = TCGA_BRCA_countsMeta$counts

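# Remove genes without a symbol (NA) and keep only the first occurrence of duplicated symbols
# (indexing by name would silently return the first matching row for every duplicate)
keep = !is.na(rownames(count.raw)) & !duplicated(rownames(count.raw))
count.mtx = count.raw[keep,]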

# Remove genes with no expression across samples 
count.mtx = count.mtx[rowSums(count.mtx) !=0,]
meta = TCGA_BRCA_countsMeta$meta
meta = meta[colnames(count.mtx),]
library(SummarizedExperiment)
library(DESeq2)
se <- SummarizedExperiment(as.matrix(count.mtx), 
                           colData=meta)
dds <- DESeqDataSet(se, ~ 1)
vsd <- vst(dds, blind=FALSE)

pcaData <- DESeq2::plotPCA(vsd, intgroup = "sample_type", returnData = TRUE)
PCA_var=attr(pcaData, "percentVar")
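DESeq2's plotPCA() with returnData = TRUE returns the first two PC scores and stores the fraction of variance explained in attr(pcaData, "percentVar"). As far as the DESeq2 source shows, it keeps the ntop = 500 most variable genes of the transformed matrix, runs prcomp() on the transposed matrix (samples as rows), and divides each squared standard deviation by their sum. A base-R sketch of that calculation, using a random toy matrix as a stand-in for assay(vsd); pca_percent_var() is a hypothetical helper.

```r
set.seed(1)
# Random toy "transformed expression" matrix: 1000 genes x 20 samples (stand-in for assay(vsd))
mat <- matrix(rnorm(1000 * 20), nrow = 1000,
              dimnames = list(paste0("gene", 1:1000), paste0("S", 1:20)))

pca_percent_var <- function(mat, ntop = 500) {
  rv  <- apply(mat, 1, var)                                       # per-gene variance
  top <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  pca <- prcomp(t(mat[top, ]))                                    # samples as rows, centered
  list(scores     = pca$x[, 1:2],
       percentVar = pca$sdev^2 / sum(pca$sdev^2))
}

out <- pca_percent_var(mat)
round(100 * out$percentVar[1:2])   # the numbers shown in the PC1/PC2 axis labels
```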

PCA plot

library(ggplot2)
ggplot(pcaData, aes(x = PC1, y = PC2, fill = sample_type)) +
  geom_point(size = 2, alpha = 0.8, shape = 21, color = "black", stroke = 0.2)  +
  # ggrepel::geom_text_repel(aes(label=name), 
  #                          color="grey6", size=3, hjust= -0.3, vjust=-0.3) +
  labs(x = paste0("PC1: ", round(100 * PCA_var[1]), "% variance"),
       y = paste0("PC2: ", round(100 * PCA_var[2]), "% variance")) +
  theme_bw() +
  theme(legend.title = element_blank()) +
  ggtitle("PCA") +
  labs(caption = " ")

PCA plot version 2 (with sample counts)

# Count the samples per sample_type
sample_type_counts <- table(pcaData$sample_type)

# Create the plot
p <- ggplot(pcaData, aes(x = PC1, y = PC2, fill = sample_type)) +
  geom_point(size = 2, alpha = 0.8, shape = 21, color = "black", stroke = 0.2) +
  # ggrepel::geom_text_repel(aes(label = name), 
  #                          color = "grey6", size = 3, hjust = -0.3, vjust = -0.3) +
  labs(x = paste0("PC1: ", round(100 * PCA_var[1]), "% variance"),
       y = paste0("PC2: ", round(100 * PCA_var[2]), "% variance")) +
  theme_bw() +
  theme(legend.title = element_blank()) +
  ggtitle("PCA") +
  labs(caption = " ")

# Build the "sample_type: count" label text
count_text <- paste(names(sample_type_counts), sample_type_counts, sep = ": ", collapse = "\n")

# Add the label to the top-right corner of the plot
p + annotate("text", x = Inf, y = Inf, label = count_text, hjust = 1, vjust = 1, size = 3, color = "black", angle = 0, fontface = "bold")

Differentially Expressed Genes (DEG) analysis

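# TCGA_BRCA_tpmsMeta.rds is not created above; it is assumed to be built like TCGA_BRCA_countsMeta.rds
# but from the "tpm_unstrand" assay. The TPM matrix is used later for ssGSEA.
TCGA_BRCA_tpmsMeta = readRDS(paste0(dir,"TCGA_BRCA_tpmsMeta.rds"))
tpms = TCGA_BRCA_tpmsMeta$counts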

DEG strategy

Differentially expressed genes (DEGs) between tumor tissue and normal tissue

Comparison: Tumor tissue / Normal tissue (total n = 1224)

  • Tumor (Primary Tumor): n = 1111
  • Normal (Solid Tissue Normal): n = 113




Create DEG object

# Generate the sample information table (one condition per sample)
info <- data.frame(sample = colnames(count.mtx),
                   cond = factor(dds$sample_type,
                                 levels = c("Solid Tissue Normal", "Primary Tumor")), # control (reference) level first
                   row.names = colnames(count.mtx))
# levels(info$cond)

# DESeq
dds <- DESeqDataSetFromMatrix(count.mtx, info, ~ cond)
dds <- DESeq(dds) 
# dds %>% saveRDS(paste0(dir,"TCGA_BRCA_countsMeta.dds.rds"))
# DESeq() on ~1,200 samples is slow: after saving once, the fitted object can be
# reloaded instead of rerunning the two lines above
# dds = readRDS(paste0(dir,"TCGA_BRCA_countsMeta.dds.rds"))
res <- results(dds)
res <- data.frame(res)

Add DEG information

# Add DEG information 
fc = 2
pval = 0.05

# res = res %>% mutate(DE=ifelse(log2FoldChange >= log2(fc) & pvalue < pval, 'UP',
#                                ifelse(log2FoldChange <= -log2(fc) & pvalue < pval, 'DN','no_sig')))

res = res %>% mutate(DE=ifelse(log2FoldChange >= log2(fc) & padj < pval, 'UP',
                               ifelse(log2FoldChange <= -log2(fc) & padj < pval, 'DN','no_sig')))
res = na.omit(res)
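The UP/DN/no_sig rule above can be packaged as a small function and checked on toy values. classify_de() is a hypothetical helper, not part of DESeq2; it applies the same thresholds (|log2 fold change| >= log2(2) = 1 and padj < 0.05) and explicitly marks genes with an NA adjusted p-value (for example, genes removed by DESeq2's independent filtering) as NA. These are the rows that na.omit() removes.

```r
# Hypothetical helper reproducing the labelling rule used above
classify_de <- function(log2fc, padj, fc = 2, alpha = 0.05) {
  out <- ifelse(log2fc >= log2(fc) & padj < alpha, "UP",
         ifelse(log2fc <= -log2(fc) & padj < alpha, "DN", "no_sig"))
  out[is.na(padj) | is.na(log2fc)] <- NA    # untestable genes get no label
  factor(out, levels = c("UP", "DN", "no_sig"))
}

# Toy results table (made-up values)
toy <- data.frame(log2FoldChange = c(2.5, -1.2, 0.4, 3.0, -2.0),
                  padj           = c(1e-5, 0.01, 1e-8, 0.20, NA))
toy$DE <- classify_de(toy$log2FoldChange, toy$padj)
table(toy$DE, useNA = "ifany")
```

Note that a gene needs both criteria: the third toy gene is highly significant but changes too little, and the fourth changes a lot but is not significant, so both end up as no_sig.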

DEG table

res %>% DT::datatable()

Volcano plot

res$DE = factor(res$DE, levels = c('UP','DN','no_sig'))
res %>% 
  ggplot(aes(log2FoldChange, -log10(padj), color=DE)) + 
  geom_point(size=1, alpha=0.5) + 
  scale_color_manual(values = c("red3","royalblue3","grey")) +
  theme_classic() +
  geom_vline(xintercept = c(-log2(fc),log2(fc)), color='grey') +
  geom_hline(yintercept = -log10(0.05),color='grey') +
  guides(colour = guide_legend(override.aes = list(size=5))) +
  ggtitle(paste0(levels(dds$cond)[2], " / ", levels(dds$cond)[1] )) +
  ggeasy::easy_center_title() ## to center title

Volcano plot with the number of DEGs

t= paste0(levels(dds$cond)[2], " / ", levels(dds$cond)[1] )
up = nrow(res[res$DE == "UP", ])
dn = nrow(res[res$DE == "DN", ])
res %>% ggplot(aes(log2FoldChange, -log10(padj), color=DE)) + 
  geom_point(size=0.5, shape=19, alpha=0.7) +
  geom_vline(xintercept = c(-log2(fc), log2(fc)), linewidth=0.1, color="grey") + # `linewidth` needs ggplot2 >= 3.4.0
  geom_hline(yintercept = -log10(0.05), linewidth=0.1, color="grey") +
  scale_color_manual(values = c("red3","royalblue3","grey"), guide = "none") +
  annotate("text", x = Inf, y = Inf, label = paste0("UP: ", up), 
           hjust = 1.1, vjust = 2, size = 5, color = "red") +
  annotate("text", x = -Inf, y = Inf, label = paste0("DN: ", dn), 
           hjust = -0.1, vjust = 2, size = 5, color = "royalblue") +
  theme_bw() + ggtitle(t)

Number of DEGs

res %>% filter(DE != "no_sig") %>% 
  ggplot(aes(DE, fill=DE)) + geom_bar(color="black", linewidth=0.2) +
  geom_text(stat = 'count', aes(label = after_stat(count)), vjust = -0.1, size= 4, color=c("salmon","royalblue")) +
  scale_fill_manual(values = c("salmon", "royalblue"), guide = "none") +
  theme_bw()

GSEA

GSEA HALLMARK

library(clusterProfiler)
hallmark <- msigdbr::msigdbr(species = "Homo sapiens", category = "H") %>% 
  dplyr::select(gs_name, gene_symbol)

perform_GSEA <- function(res, ref, pvalueCutoff = 1) {
  ranking <- function(res) {
    df <- res$log2FoldChange
    names(df) <- rownames(res)
    df <- sort(df, decreasing = TRUE)
    return(df)
  }
  
  ranked.res <- ranking(res)
  
  x <- clusterProfiler::GSEA(geneList = ranked.res,
                             TERM2GENE = ref,
                             pvalueCutoff = pvalueCutoff,
                             pAdjustMethod = "BH",
                             verbose = TRUE,
                             seed = TRUE)
  
  result <- x@result %>% arrange(desc(NES))
  result <- result[, c('NES', 'pvalue', 'p.adjust', 'core_enrichment', 'ID')]
  return(result)
}

# Application 
gsea.res = perform_GSEA(res = res, ref = hallmark) 
filtered_gsea = gsea.res %>% mutate(sig= ifelse(pvalue <= 0.05,"p value <= 0.05", "p value > 0.05"))

# Modified GSEA NES plot 
gsea_nes_plot =function(gsea.res, title, fontsize.x = 5, fontsize.y = 6){
  gsea.res %>% ggplot(aes(reorder(ID, NES), NES)) +
    geom_col(aes(fill=sig), color="grey1", size=0.2) +
    coord_flip() +
    labs(x="Pathway", y="Normalized Enrichment Score",
         title= "GSEA") + 
    theme_classic() +
    # scale_fill_gradient(low = '#FF0000', high = '#E5E7E9') +
    scale_fill_manual(values = c("#FF0000","grey88")) +
    theme(axis.text.x= element_text(size=fontsize.x, face = 'bold'),
          axis.text.y= element_text(size=fontsize.y, face = 'bold'), 
          axis.title =element_text(size=10)) +ggtitle(title)
}
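clusterProfiler::GSEA() takes the genes ranked by the supplied statistic (here log2FoldChange) and, for each gene set, walks down the list accumulating a weighted running sum; the enrichment score (ES) is the running sum's largest deviation from zero. A base-R sketch of that score with weight exponent 1 (GSEA()'s default exponent = 1) follows. It omits the permutation p-values and the normalization that turns ES into NES, and gsea_es() and the gene names are hypothetical.

```r
# Enrichment score of one gene set on a ranked list (weighted running sum, exponent = 1)
gsea_es <- function(ranked, gene_set, exponent = 1) {
  ranked <- sort(ranked, decreasing = TRUE)
  hit <- names(ranked) %in% gene_set
  if (!any(hit) || all(hit)) stop("gene set must contain some, but not all, ranked genes")
  w <- abs(ranked)^exponent
  p_hit  <- cumsum(ifelse(hit, w, 0)) / sum(w[hit])   # weighted fraction of set genes passed
  p_miss <- cumsum(!hit) / sum(!hit)                  # fraction of non-set genes passed
  running <- p_hit - p_miss
  running[which.max(abs(running))]                    # signed maximum deviation from zero
}

# Toy ranked list of 8 hypothetical genes
ranked <- setNames(c(3, 2, 1.5, 0.5, -0.2, -1, -2.5, -3), paste0("G", 1:8))
gsea_es(ranked, c("G1", "G2"))   # set at the top of the list: ES = 1
gsea_es(ranked, c("G7", "G8"))   # set at the bottom of the list: ES = -1
```

A positive NES in the plots below therefore means the pathway's genes tend to sit at the top of the ranking (higher in tumor), and a negative NES means they cluster at the bottom.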

GSEA NES plot

t= paste0("DEGs from ",levels(dds$cond)[2], " / ", levels(dds$cond)[1] )
gsea_nes_plot(filtered_gsea, title = t, fontsize.x = 8, fontsize.y = 8)

GSEA NES plot (filtered)

filtered_gsea = gsea.res %>% mutate(sig= ifelse(p.adjust <= 0.05,"adj. p value <= 0.05", "adj. p value > 0.05"))
filtered_gsea = filtered_gsea %>% filter(sig == "adj. p value <= 0.05")
# filtered_gsea = filtered_gsea %>% distinct(core_enrichment, .keep_all = T)

t= paste0("DEGs from ",levels(dds$cond)[2], " / ", levels(dds$cond)[1] )
gsea_nes_plot(filtered_gsea, title = t, fontsize.x = 8, fontsize.y = 8)

GSEA table

filtered_gsea %>% DT::datatable()

GSEA KEGG

library(clusterProfiler)
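# In recent MSigDB/msigdbr releases the legacy KEGG collection may be labelled "CP:KEGG_LEGACY" instead
kegg <- msigdbr::msigdbr(species = "Homo sapiens", subcategory = "CP:KEGG") %>% 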
  dplyr::select(gs_name, gene_symbol)

perform_GSEA <- function(res, ref, pvalueCutoff = 1) {
  ranking <- function(res) {
    df <- res$log2FoldChange
    names(df) <- rownames(res)
    df <- sort(df, decreasing = TRUE)
    return(df)
  }
  
  ranked.res <- ranking(res)
  
  x <- clusterProfiler::GSEA(geneList = ranked.res,
                             TERM2GENE = ref,
                             pvalueCutoff = pvalueCutoff,
                             pAdjustMethod = "BH",
                             verbose = TRUE,
                             seed = TRUE)
  
  result <- x@result %>% arrange(desc(NES))
  # result <- result[, c('NES', 'pvalue', 'p.adjust', 'core_enrichment', 'ID')]
  return(result)
}

# Application 
gsea.res = perform_GSEA(res = res, ref = kegg) 

GSEA NES plot

filtered_gsea = gsea.res %>% mutate(sig= ifelse(p.adjust <= 0.05,"adj. p value <= 0.05", "adj. p value > 0.05"))
# filtered_gsea = filtered_gsea %>% filter(sig == "adj. p value <= 0.05")
# filtered_gsea = filtered_gsea %>% distinct(core_enrichment, .keep_all = T)

t= paste0("DEGs from ",levels(dds$cond)[2], " / ", levels(dds$cond)[1] )
gsea_nes_plot(filtered_gsea, title = t, fontsize.x = 8, fontsize.y = 6)

GSEA NES plot (filtered)

filtered_gsea = gsea.res %>% mutate(sig= ifelse(p.adjust <= 0.05,"adj. p value <= 0.05", "adj. p value > 0.05"))
filtered_gsea = filtered_gsea %>% filter(sig == "adj. p value <= 0.05")
# filtered_gsea = filtered_gsea %>% distinct(core_enrichment, .keep_all = T)

t= paste0("DEGs from ",levels(dds$cond)[2], " / ", levels(dds$cond)[1] )
gsea_nes_plot(filtered_gsea, title = t, fontsize.x = 8, fontsize.y = 8)

GSEA table

filtered_gsea[, c("NES","p.adjust","core_enrichment","sig")] %>% DT::datatable()

ssGSEA

ssGSEA by HALLMARK

gs_names = hallmark$gs_name %>% unique()

# Build a named list: one character vector of gene symbols per HALLMARK gene set
# (split() groups the symbols by gene set; [gs_names] keeps the original set order)
hallmarkList <- split(hallmark$gene_symbol, hallmark$gs_name)[gs_names]
## Perform ssgsea 
library(corto)

## Input data : tpm (count.mtx is accepted as well) 
test = ssgsea(tpms,hallmarkList)

## Reshape it to plot 
test.df = test %>% t() %>% data.frame()
rownames(test.df) = colnames(tpms)

## p value of ssgsea 
pval = corto::z2p(test)
colnames(pval) = rownames(test.df)
meta = colData(dds) 
anno.col = meta[,"cond"] %>% as.data.frame()
rownames(anno.col) = rownames(meta)
colnames(anno.col) = "sample"
anno.col = anno.col %>% arrange(sample)

## Heatmap 
my.color=c(colorRampPalette(colors = c("#2874A6","white"))(70),
           colorRampPalette(colors = c("white","#D35400"))(70))
t(test.df[rownames(anno.col),]) %>% pheatmap::pheatmap(color = my.color, 
                                  cluster_rows = T,
                                  cluster_cols = F,
                                  main = "ssGSEA of HALLMARK pathways",
                                  show_rownames = T,
                                  show_colnames = F,
                                  annotation_col = anno.col,
                                  fontsize_row = 7) # Simple heatmap 

ssGSEA by HALLMARK version 2

## Heatmap 
my.color=c(colorRampPalette(colors = c("#2874A6","white"))(70),
           colorRampPalette(colors = c("white","#D35400"))(70))
t(test.df[rownames(anno.col),]) %>% pheatmap::pheatmap(color = my.color, 
                                  cluster_rows = T,
                                  cluster_cols = T,
                                  main = "ssGSEA of HALLMARK pathways",
                                  show_rownames = T,
                                  show_colnames = F,
                                  annotation_col = anno.col,
                                  fontsize_row = 5) # Simple heatmap 

ssGSEA by KEGG

gs_names = kegg$gs_name %>% unique()

# Build a named list: one character vector of gene symbols per KEGG gene set
# (split() groups the symbols by gene set; [gs_names] keeps the original set order)
geneList <- split(kegg$gene_symbol, kegg$gs_name)[gs_names]
## Perform ssgsea 
library(corto)

## Input data : tpm (count.mtx is accepted as well) 
test = ssgsea(tpms,geneList)

## Reshape it to plot 
test.df = test %>% t() %>% data.frame()
rownames(test.df) = colnames(tpms)

## p value of ssgsea 
pval = corto::z2p(test)
colnames(pval) = rownames(test.df)
## Heatmap 
my.color=c(colorRampPalette(colors = c("#2874A6","white"))(70),
           colorRampPalette(colors = c("white","#D35400"))(70))
t(test.df[rownames(anno.col),]) %>% pheatmap::pheatmap(color = my.color, 
                                  cluster_rows = T,
                                  cluster_cols = F,
                                  main = "ssGSEA of KEGG pathways",
                                  show_rownames = T,
                                  show_colnames = F,
                                  annotation_col = anno.col,
                                  fontsize_row = 5) # Simple heatmap 

ssGSEA by KEGG version 2

## Heatmap 
my.color=c(colorRampPalette(colors = c("#2874A6","white"))(70),
           colorRampPalette(colors = c("white","#D35400"))(70))
t(test.df[rownames(anno.col),]) %>% pheatmap::pheatmap(color = my.color, 
                                  cluster_rows = T,
                                  cluster_cols = T,
                                  main = "ssGSEA of KEGG pathways",
                                  show_rownames = T,
                                  show_colnames = F,
                                  annotation_col = anno.col,
                                  fontsize_row = 5) # Simple heatmap 




More metadata information

Morphology Codes

  1. 8010/3 (Carcinoma, NOS - Not Otherwise Specified):
    • A general form of cancer used when a specific subtype cannot be further specified.
  2. 8013/3 (Large Cell Neuroendocrine Carcinoma):
    • A high-grade, poorly differentiated neuroendocrine carcinoma composed of large cells.
  3. 8022/3 (Pleomorphic Carcinoma):
    • Cancer with cells of various shapes and sizes. Often aggressive and associated with a poor prognosis.
  4. 8050/3 (Papillary Carcinoma):
    • Cancer that forms small finger-like projections. Found in thyroid as well as breast cancer.
  5. 8090/3 (Basal Cell Carcinoma):
    • Cancer originating from basal cells, primarily a form of skin cancer.
  6. 8200/3 (Adenoid Cystic Carcinoma):
    • Cancer with gland-like structures and cystic spaces. Primarily occurs in the salivary glands but can also appear in the breast.
  7. 8201/3 (Cribriform Carcinoma):
    • Cancer with a sieve-like appearance. A subtype of breast cancer, generally associated with a relatively good prognosis.
  8. 8211/3 (Tubular Adenocarcinoma):
    • Cancer with small tube-like structures. A form of breast cancer, generally associated with a relatively good prognosis.
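  9. 8401/3 (Apocrine Adenocarcinoma):
    • Carcinoma whose cells show apocrine differentiation (abundant granular, eosinophilic cytoplasm). A rare special type of breast cancer.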
  10. 8480/3 (Mucinous Adenocarcinoma):
    • Cancer that produces a large amount of mucus. Also known as mucinous carcinoma, it can have a relatively good prognosis.
  11. 8500/3 (Infiltrating Ductal Carcinoma - IDC):
    • The most common form of breast cancer; it arises in the milk ducts and invades the surrounding breast tissue. Accounts for about 70-80% of all breast cancer cases.
  12. 8502/3 (Secretory Carcinoma):
    • Also known as secretory adenocarcinoma, a rare type of cancer. Typically found in younger patients.
  13. 8503/3 (Intraductal Papillary Adenocarcinoma with Invasion):
    • A papillary carcinoma arising within a duct that has invaded the surrounding tissue. (Inflammatory carcinoma has the separate code 8530/3.)
  14. 8507/3 (Invasive Micropapillary Carcinoma):
    • Carcinoma composed of small clusters of tumor cells lying in clear spaces, frequently associated with lymph node metastasis. (Medullary carcinoma has the separate code 8510/3.)

Pathological N Stages

meta$ajcc_pathologic_n

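  1. N0: No regional lymph node metastasis identified histologically.
    • N0 (i-): No regional lymph node metastasis histologically; negative immunohistochemistry (IHC).
    • N0 (i+): Only isolated tumor cells (no larger than 0.2 mm) in regional lymph nodes.
    • N0 (mol+): Positive molecular findings (e.g., RT-PCR) but no regional lymph node metastasis detected by histology or IHC.
  2. N1: Micrometastases, or metastases in 1 to 3 axillary lymph nodes and/or in internal mammary nodes detected by sentinel lymph node biopsy but not clinically detected.
    • N1a: Metastases in 1 to 3 axillary lymph nodes.
    • N1b: Metastases in internal mammary nodes detected by sentinel lymph node biopsy but not clinically detected.
    • N1c: Combination of N1a and N1b.
    • N1mi: Micrometastases (larger than 0.2 mm but no larger than 2.0 mm).
  3. N2: Metastases in 4 to 9 axillary lymph nodes, or in clinically detected internal mammary lymph nodes without axillary lymph node metastases.
    • N2a: Metastases in 4 to 9 axillary lymph nodes.
  4. N3: Metastases in 10 or more axillary lymph nodes, or in infraclavicular, clinically detected internal mammary (with positive axillary nodes), or ipsilateral supraclavicular lymph nodes.
    • N3a: Metastases in 10 or more axillary lymph nodes, or in infraclavicular (level III axillary) lymph nodes.
    • N3b: Metastases in clinically detected ipsilateral internal mammary lymph nodes together with positive axillary lymph nodes, or in more than 3 axillary lymph nodes together with internal mammary metastases detected by sentinel lymph node biopsy.
    • N3c: Metastases in ipsilateral supraclavicular lymph nodes.
  5. NX: Regional lymph nodes cannot be assessed.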
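For survival or association analyses, the detailed ajcc_pathologic_n labels are often grouped into their main category. A minimal sketch using a regular expression, assuming the labels look like those listed above (e.g. N1a, N0 (i+), NX); collapse_n_stage() is a hypothetical helper, the toy vector is made up, and the real column may contain other spellings that would need extra handling.

```r
# Hypothetical helper: collapse detailed pathological N labels to N0/N1/N2/N3/NX
collapse_n_stage <- function(x) {
  out <- sub("^(N[0-3X]).*$", "\\1", x)   # keep "N" plus the first digit or "X"
  out[!grepl("^N[0-3X]", x)] <- NA        # anything unrecognised (or NA) becomes NA
  out
}

n_stage <- c("N0 (i-)", "N0 (i+)", "N1a", "N1mi", "N2a", "N3c", "NX", NA)
collapse_n_stage(n_stage)
# On the real data: table(collapse_n_stage(meta$ajcc_pathologic_n), useNA = "ifany")
```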

Pathological M Stages

meta$ajcc_pathologic_m

  1. cM0 (i+): No clinical or radiographic evidence of distant metastases, but tumor cells are detected.
    • Description: Although clinical and imaging assessments show no distant metastases, small deposits of tumor cells (no larger than 0.2 mm) are detected microscopically or molecularly in circulating blood, bone marrow, or non-regional nodal tissue.
  2. M0: No distant metastasis.
    • Description: No clinical or radiographic evidence of distant metastases. This is the most common category, with 1016 samples.
  3. M1: Presence of distant metastasis.
    • Description: The cancer has spread to distant organs or sites. 24 samples are in this category.
  4. MX: Distant metastasis cannot be assessed.
    • Description: Used when it is not possible to assess whether distant metastases are present. 176 samples are in this category.